    Digitizing Historical Forest Service Data

    When ecologists work in the field, they often record their data by hand on paper datasheets. This hard-won information then tends to remain trapped in physical copies stored in filing cabinets, where it cannot be analyzed further. We are collaborating with the Sawtooth National Forest Service to digitize its historical data: decades of records on vegetation and soil conditions in the Sun Valley, Idaho, area. The goal of this project is to create an Optical Character Recognition (OCR) model that can process the handwritten datasheets and generate digitized versions of them. By making nearly a century of environmental data ready for statistical analysis, this project will allow Forest Service and BSU scientists to answer important questions about how some of Idaho's most spectacular landscapes have been affected by climate change, sheep grazing, and natural resource management decisions, across areas and timeframes that were previously impractical to study.

    Supporting Climate Research using Named Data Networking

    Climate and other big data applications face substantial problems in data storage, retrieval, sharing, and management. Although several community repositories and tools are available for climate data, these problems persist, and the community is actively looking for better solutions. In this project we apply Named Data Networking (NDN) to support climate modeling applications. The information-centric nature of NDN, in which content is a first-class entity, simplifies many of the problems in this domain: compared with IP-based solutions, NDN offers lightweight data publication, discovery, and retrieval. However, introducing a new network architecture into a mature domain that routinely produces petabytes of data, along with a plethora of tools to manipulate them, is a risky proposition, and the advantages of NDN alone may not be enough to overcome the community's natural inertia. Our approach is therefore to introduce NDN while carefully avoiding undue disruption to existing workflows. To that end, we provide a user interface that uses familiar filesystem operations to publish, discover, and retrieve data, integrated with domain-specific translators that automatically convert datasets and publish them as NDN objects. We outline the advantages of NDN in this application domain and the challenges we faced while adapting it. We believe this is the first exercise in applying NDN to an existing, large, mature application domain.
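    To make the idea of a domain-specific translator concrete, here is a minimal Python sketch, not the authors' implementation, that turns a climate data file name into a hierarchical NDN-style name. It assumes input files follow the CMIP5-style naming pattern variable_table_model_experiment_ensemble[_timerange].nc; the abstract does not say which convention the translators actually use. The /climate/CMIP5 prefix and the ordering of name components (most general first) are hypothetical choices made for illustration.

```python
import re

# Assumed input convention (CMIP5-style file names):
#   <variable>_<table>_<model>_<experiment>_<ensemble>[_<time range>].nc
_CMIP5_FILE = re.compile(
    r"^(?P<variable>[^_]+)_(?P<table>[^_]+)_(?P<model>[^_]+)_"
    r"(?P<experiment>[^_]+)_(?P<ensemble>[^_]+)(?:_(?P<time>[^_]+))?\.nc$"
)

# Hypothetical component order, most general first, so that a name prefix such as
# /climate/CMIP5/<model>/<experiment> covers every file of one experiment.
_ORDER = ("model", "experiment", "table", "variable", "ensemble", "time")


def filename_to_ndn_name(filename: str, prefix: str = "/climate/CMIP5") -> str:
    """Translate a CMIP5-style file name into a URI-style NDN name (sketch)."""
    match = _CMIP5_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a CMIP5-style file name: {filename}")
    fields = match.groupdict()
    # The time-range component is optional; skip it when the file name omits it.
    parts = [fields[key] for key in _ORDER if fields[key] is not None]
    return prefix.rstrip("/") + "/" + "/".join(parts)
```

    For example, `filename_to_ndn_name("tas_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc")` returns `/climate/CMIP5/HadGEM2-ES/historical/Amon/tas/r1i1p1/185912-188411`. Putting the most general metadata first is what lets a filesystem-style view list one "directory" level at a time by name prefix.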

    Managing scientific data with named data networking

    Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that the IP network architecture does not support well. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover, which each application would otherwise have to painstakingly (re-)implement on top of point-to-point semantics in an IP network. We present the first scientific data management application designed and implemented on top of NDN, and we use it to manage climate and HEP data over a dedicated, high-performance testbed. The application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.
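    A federation of name catalogs can be pictured with the toy Python model below. It is a sketch under stated assumptions, not the application's design: the `NameCatalog` class and its methods are hypothetical, and synchronization is modeled as a plain set union between two replicas (a grow-only set with no deletions) rather than whatever NDN synchronization protocol the actual system uses. Search matches name prefixes component by component, as NDN names are hierarchical.

```python
from dataclasses import dataclass, field


def components(name: str) -> tuple[str, ...]:
    """Split a URI-style NDN name such as /a/b/c into its components."""
    return tuple(part for part in name.split("/") if part)


@dataclass
class NameCatalog:
    """Hypothetical toy catalog of published dataset names."""
    names: set[tuple[str, ...]] = field(default_factory=set)

    def publish(self, name: str) -> None:
        """Record a newly published dataset name in this replica."""
        self.names.add(components(name))

    def search(self, prefix: str) -> list[str]:
        """Return all names under `prefix`, compared component by component."""
        want = components(prefix)
        hits = [n for n in self.names if n[:len(want)] == want]
        return sorted("/" + "/".join(n) for n in hits)

    def sync_with(self, other: "NameCatalog") -> None:
        """Reconcile two replicas by set union, so both end with the same names."""
        merged = self.names | other.names
        self.names = set(merged)
        other.names = set(merged)
```

    Comparing whole components rather than raw strings matters: a query for `/climate/CMIP` should not match `/climate/CMIP5/...`, because `CMIP` and `CMIP5` are different name components. After `a.sync_with(b)`, a name published only at catalog `a` becomes searchable at `b` as well.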

    Identifying and Scheduling Loop Chains Using Directives

    Exposing opportunities for parallelization while explicitly managing data locality is the primary challenge in porting and optimizing existing computational science simulation codes to improve performance and accuracy. OpenMP provides many mechanisms for expressing parallelism, but grouping computations to improve data locality remains primarily the programmer's responsibility. The loop chain abstraction, in which data access patterns are specified alongside parallel loops, gives compilers enough information to automate the trade-off between parallelism and data locality. In this paper, we present a loop chain pragma and an extension to the omp for directive that enable the specification of loop chains and of high-level schedules on them. We show example usage of the extensions, describe their implementation, and present preliminary performance results for some simple examples.
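    The kind of schedule a loop chain makes possible can be illustrated without the paper's pragma syntax, which is not reproduced here. The Python sketch below is a hand-written illustration (the real extensions target C/C++ with OpenMP): a chain of two three-point stencil loops, run first as written and then fused with a shift of one iteration. Because each loop's access pattern is known to be {i-1, i, i+1}, running the second loop one iteration behind the first guarantees that every value of `b` it reads has already been produced, while keeping the reuse of `b` close in time for better locality. Function names are hypothetical.

```python
def unfused(a: list[float]) -> list[float]:
    """Two stencil loops executed one after the other, as written."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    # Loop 1 writes b[i] and reads a[i-1], a[i], a[i+1].
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    # Loop 2 writes c[i] and reads b[i-1], b[i], b[i+1].
    for i in range(2, n - 2):
        c[i] = (b[i - 1] + b[i] + b[i + 1]) / 3.0
    return c


def fused_with_shift(a: list[float]) -> list[float]:
    """The same chain fused into one loop, with loop 2 shifted back by one iteration."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        j = i - 1  # loop-2 iteration; it needs b[j + 1] == b[i], computed just above
        if 2 <= j < n - 2:
            c[j] = (b[j - 1] + b[j] + b[j + 1]) / 3.0
    return c
```

    Both functions perform the same floating-point operations in the same order, so they return identical results; only the interleaving of iterations changes. Deciding such a shift automatically from declared access patterns, instead of rewriting the loops by hand, is the job the loop chain abstraction hands to the compiler.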